Abstract

Cancer, a leading cause of death worldwide, places a heavy burden on health-care systems. In this study, an
order-prediction model was built to predict a series of cancer drug indications based on chemical-chemical interactions.
According to the confidence scores of their interactions, the order from the most likely cancer to the least likely one was
obtained for each query drug. The 1st order prediction accuracy on the training dataset was 55.93%, evaluated by the
Jackknife test, while it was 55.56% and 59.09% on a validation test dataset and an independent test dataset, respectively.
The proposed method outperformed a popular method based on molecular descriptors. Moreover, it was verified that some
drugs were effective against their 'wrong' predicted indications, indicating that some 'wrong' drug indications were
actually correct. Encouraged by these promising results, we expect that the method may become a useful tool for
predicting drug indications.
Introduction

Cancer is the main cause of death in both developed and 
developing countries [1]. In 2008 alone, there were 12.7 million 
new cancer cases and 7.6 million cancer deaths worldwide [1]. 
Meanwhile, the number of newly approved drugs has declined
continually in spite of increasing R&D investment [2]. R&D
of a drug requires comprehensive experimental testing, which
often costs millions of dollars, involves several thousand animals,
and takes many years to complete. As a result, not many
chemicals have undergone the degree of testing needed to support
accurate health risk assessments or meet regulatory requirements
for drug approval. Thus, it is very attractive to develop quick,
reliable, and non-animal-involved prediction methods, e.g. using
structure-activity relationships (SARs), to predict the anticancer
activities of chemicals.

Some pioneering studies indicated that interactive proteins are
more likely to share the same functions than non-interactive ones
[3,4,5]. Likewise, interactive compounds are also more likely to
share common properties [6,7,8]. STITCH (Search Tool for
Interactions of Chemicals, http://stitch.embl.de/) is a well-known
database containing interaction information for proteins and
chemicals [9,10]. It provides three categories of interactive
compounds: (1) those participating in the same reactions; (2) those
sharing similar structures or activities; and (3) those with literature
associations, such as binding the same target [9]. In this study, we
attempted to build a drug-indication prediction model by
quantifying the chemical-chemical interaction of every pair of
interactive compounds. Briefly, drugs and their corresponding
indications (i.e., 8 kinds of cancers) were extracted from KEGG
(Kyoto Encyclopedia of Genes and Genomes,
http://www.genome.jp/kegg/) [11], a well-known database dealing with
genomes, enzymatic pathways, and biological chemicals, and
Drugbank [12], another database containing detailed information
on drugs and their targets. Then, the score of each
indication of the query compound was obtained from the
confidence scores of the interactions between the query compound
and its interactive compounds, using the indications of the
interactive compounds. The order from the most likely
indication to the least likely one was then obtained for each drug. Finally, the
prediction quality of the model was evaluated by the Jackknife test and
some other measures.

In addition to building an effective prediction model, another aim
of our study is to investigate the drug repositioning ability of our
model. Drug repositioning, i.e. finding novel uses of existing drugs,
is an alternative strategy for drug development because it has
the potential to speed up the process of drug approval. Several
drugs, such as thalidomide, sildenafil, bupropion and fluoxetine,
have been successfully repositioned to new indications [13,14].
Experimental approaches for drug repositioning usually employ
high throughput screening (HTS) to test libraries of drugs
against biological targets of interest. More recently, several in silico
models were developed to address drug repositioning.
Iorio et al. predicted and validated new drug modes of action and
drug repositioning from transcriptional responses [15]. Butte's
group reported two successful examples of drug repositioning
based on gene expression data from diseases and drugs [16,17].
Cheng et al. merged drug-based similarity inference (DBSI),
target-based similarity inference (TBSI) and network-based inference
(NBI) methods for drug-target association and drug repositioning
[18]. In our study, under the assumption that interactive
drugs are more likely to target the same indication, we investigated
the repositioning possibility of some 'wrong' predicted drugs by
retrieving references, and attempted to propose alternative
indications for some drugs.

Materials and Methods 

Materials 

The information of 98 drugs that can treat cancers was retrieved 
from KEGG DISEASE in KEGG [11]. These drugs can treat the 
following 10 kinds of cancers: 

(1) Cancers of the nervous system 

(2) Cancers of the digestive system 

(3) Cancers of haematopoietic and lymphoid tissues 

(4) Cancers of the breast and female genital organs 

(5) Cancers of soft tissues and bone 

(6) Skin cancers 

(7) Cancers of the urinary system and male genital organs 

(8) Cancers of endocrine organs 

(9) Head and neck cancers 

(10) Cancers of the lung and pleura 

Since some drugs have no information on chemical-chemical
interactions, we discarded these drugs, resulting in 68 drugs. After
that, we found that 'Skin cancers' and 'Head and neck cancers'
only contained 3 and 4 drugs, respectively. It is not possible to
establish an effective prediction model with only a few samples,
thus these two kinds of cancers were abandoned. As a result, 68
drugs were obtained, comprising the benchmark dataset S. These
68 drugs were classified into 8 categories such that drugs that
can treat one kind of cancer comprised one category. The codes
of the 68 drugs and their indications can be found in Table S1.
The number of drugs in each category is listed in column 5 of
Table 1. For convenience, we used tags $C_1, C_2, \ldots, C_8$ to
represent each kind of cancer. Please see columns 1 and 2 of
Table 1 for the correspondence between tags and cancers. It is observed
from Table 1 that the sum of the numbers of drugs over the
categories (111) is much larger than the number of distinct drugs in S (68), indicating
that some drugs belong to more than one category, i.e. some drugs
can treat more than one kind of cancer. In detail, 50 drugs can
treat only one kind of cancer, while 18 drugs can treat at least two
kinds of cancers. Please refer to Figure 1 for a plot of the number
of drugs against the number of cancers they can treat. Thus, it is a
multi-label classification problem, which requires ranking the
aforementioned 8 categories for each drug in descending order of likelihood. A
classifier providing only one candidate cancer that a query drug
can treat is not an optimal choice. Similar to the situation when
dealing with proteins and compounds with multiple attributes
[7,19], the proposed method also needs to provide a series of
candidate cancers, ranging from the most likely cancer to the least
likely one.



To better evaluate the proposed method, the benchmark dataset
S was divided into one training dataset $S_{tr}$ and one validation test
dataset $S_{te}$, i.e. $S = S_{tr} \cup S_{te}$ and $S_{tr} \cap S_{te} = \emptyset$, where drugs that can
treat exactly one kind of cancer and half of the drugs that can treat
at least two kinds of cancers comprised $S_{tr}$, while $S_{te}$ contained the
remaining drugs in S. The numbers of drugs in each category for $S_{tr}$ and
$S_{te}$ are listed in columns 3 and 4 of Table 1, respectively.

In addition, to test the generalization of the proposed method,
we extracted 59 drug compounds from Drugbank [12] that are
not in the benchmark dataset S. After excluding drug compounds
without information on chemical-chemical interactions, 44 drugs
were obtained, comprising the independent test dataset $S_{ite}$. The
number of drugs in each category of $S_{ite}$ is listed in column 6 of
Table 1, and the detailed information on these drug compounds,
including their codes and indications, can be found in Table S2.

Chemical-chemical Interactions 

In recent years, information on chemical-chemical interactions
has been increasingly used to predict various attributes of
compounds [7,8,20]. The basic idea is that interactive compounds
are more likely to share common functions than non-interactive
ones. Compared with information based on chemical
structure alone, it includes other essential properties of compounds,
such as their activities and reactions.

The information on interactive compounds was downloaded
from STITCH (chemical_chemical.links.detailed.v3.1.tsv.gz) [9].
In the obtained file, each interaction consists of two compounds
and five kinds of scores entitled 'Similarity', 'Experimental',
'Database', 'Textmining' and 'Combined_score'. In detail, the
first four kinds of scores are calculated based on compound
structures, activities, reactions, and co-occurrence in literature,
respectively, while the last kind of score, 'Combined_score',
integrates the aforementioned four scores. Thus, it is used in this
study to indicate the interactivity of two compounds, i.e. two
compounds are interactive compounds if and only if the
'Combined_score' of the interaction between them is greater than
zero. In fact, the value of 'Combined_score' also indicates the
strength of the interaction, i.e. the likelihood of the interaction's
occurrence. Thus, it is termed the confidence score in this study.
For convenience, we denote the confidence score of the interaction
between $c_1$ and $c_2$ by $S(c_1, c_2)$. In particular, if $c_1$ and $c_2$ are
non-interactive compounds, $S(c_1, c_2)$ is set to zero.
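The lookup of confidence scores described above can be sketched as follows. This is only a minimal illustration, assuming a tab-separated table with a header row and columns named `chemical1`, `chemical2` and `combined_score`; the column names, the compound identifiers and the sample values are assumptions made for the example, not contents of the downloaded file.

```python
import csv
import io


def load_confidence_scores(tsv_text):
    """Map each unordered compound pair to its combined (confidence) score."""
    scores = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        score = int(row["combined_score"])
        if score > 0:  # interactive compounds: combined score greater than zero
            pair = frozenset((row["chemical1"], row["chemical2"]))
            # The same pair may be listed in both directions; keep one entry.
            scores[pair] = max(score, scores.get(pair, 0))
    return scores


def confidence(scores, c1, c2):
    """S(c1, c2): the stored score, or zero for non-interactive compounds."""
    return scores.get(frozenset((c1, c2)), 0)


# Hypothetical rows; identifiers and values are made up for illustration.
SAMPLE = (
    "chemical1\tchemical2\tcombined_score\n"
    "CID_A\tCID_B\t900\n"
    "CID_B\tCID_A\t900\n"
    "CID_A\tCID_C\t250\n"
)
scores = load_confidence_scores(SAMPLE)
```

Storing each pair as a `frozenset` makes the lookup symmetric, so `confidence(scores, "CID_B", "CID_A")` and `confidence(scores, "CID_A", "CID_B")` return the same value, and any pair absent from the table yields zero.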

A total of 112 drug compounds were investigated in this study, as
described in Section "Materials", and 1,393 chemical-chemical
interactions whose confidence scores were greater than zero were
obtained. Among these interactions,
50 had a 'Similarity' score, 4 had an
'Experimental' score, 114 had a 'Database' score, and 1,352 had a
'Textmining' score. It is necessary to point out that some drug
interactions had two or more kinds of scores. As far as
the quantity of chemical-chemical interactions is concerned, the
'Textmining' score contributed most to the construction of the
prediction method described in Section "The method based on
chemical-chemical interactions".

Prediction Method 

The method based on chemical-chemical interactions. Systems
biology has been applied extensively to
the prediction of properties of proteins and compounds and is
deemed to be more efficient than some conventional methods
[7,20,21,22]. In this study, we attempt to classify cancer drugs into
the aforementioned 8 categories based on chemical interactions.

Figure 1. The number of drugs plotted against the number of cancers they can treat in the benchmark dataset.

doi:10.1371/journal.pone.0087791.g001

Suppose there are n drugs in the training set S', say
$d_1, d_2, \ldots, d_n$. The cancers that $d_i$ can treat are represented as follows:

$$F_i = (f_{1,i}, f_{2,i}, \ldots, f_{8,i})^T \qquad (1)$$

where T is the transpose operator and

$$f_{j,i} = \begin{cases} 1 & \text{if } d_i \text{ can treat cancer } C_j \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

For a query drug $d_q$, the cancers it can treat can be determined
from its interactive compounds in S'. To evaluate the likelihood that
$d_q$ can treat cancer $C_j$, we calculated a score as follows:

$$\Pi(d_q \Rightarrow C_j) = \sum_{i=1}^{n} f_{j,i} \cdot S(d_q, d_i), \quad j = 1,2,3,4,5,6,7,8 \qquad (3)$$

A larger score $\Pi(d_q \Rightarrow C_j)$ indicates that it is more likely that the
query drug can treat cancer $C_j$, and $\Pi(d_q \Rightarrow C_j) = 0$ suggests that
the probability that the query drug can treat cancer $C_j$ is zero,
because there are no interactive compounds in S' that can treat
cancer $C_j$.



Table 1. The number of drugs in each category of $S_{tr}$, $S_{te}$, S and $S_{ite}$.

| Tag | Cancer | Training dataset $S_{tr}$ | Validation test dataset $S_{te}$ | Total in S | Independent test dataset $S_{ite}$ |
|---|---|---|---|---|---|
| $C_1$ | Cancers of the nervous system | 8 | 1 | 9 | 1 |
| $C_2$ | Cancers of the digestive system | 8 | 5 | 13 | 6 |
| $C_3$ | Cancers of haematopoietic and lymphoid tissues | 24 | 6 | 30 | 21 |
| $C_4$ | Cancers of the breast and female genital organs | 13 | 6 | 19 | 11 |
| $C_5$ | Cancers of soft tissues and bone | 4 | 6 | 10 | 2 |
| $C_6$ | Cancers of the urinary system and male genital organs | 9 | 5 | 14 | 9 |
| $C_7$ | Cancers of endocrine organs | 5 | 2 | 7 | 1 |
| $C_8$ | Cancers of the lung and pleura | 6 | 3 | 9 | 7 |
| | Total | 77 | 34 | 111 | 58 |

doi:10.1371/journal.pone.0087791.t001
As mentioned in Section "Materials", predicting which cancers
a drug can treat is a multi-label classification problem. A reliable 
classifier should provide not only the most likely cancer but also a 
series of candidate cancers, ranging from the most likely one to the 
least likely one. According to the results of Eq. 3, it is easy to 
arrange the candidate cancers using the decreasing order of the 
corresponding scores. For example, if the results of Eq. 3 are: 



$$\Pi(d_q \Rightarrow C_3) > \Pi(d_q \Rightarrow C_1) > \Pi(d_q \Rightarrow C_5) > 0 \qquad (4)$$

it means that there are three candidate cancers for $d_q$, where the
most likely cancer it can treat is $C_3$, followed by $C_1$ and $C_5$.
Furthermore, $C_3$ is called the 1st order prediction, $C_1$ is the
2nd order prediction, and so forth.
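The scoring of Eq. 3 and the ordering illustrated by Eq. 4 can be sketched as below. This is a minimal illustration rather than the authors' implementation: the function name, the data layout and the toy drugs and scores are hypothetical, and breaking ties by the smaller cancer index is an assumption the text does not specify.

```python
def rank_indications(query, training_labels, confidence, n_cancers=8):
    """Rank candidate cancers for `query` by summed confidence scores.

    training_labels: dict mapping each training drug to the set of cancer
                     indices (1..n_cancers) it can treat.
    confidence:      function (drug_a, drug_b) -> non-negative score,
                     zero for non-interactive compounds.
    Returns the cancer indices with a positive score, best first.
    """
    totals = {j: 0.0 for j in range(1, n_cancers + 1)}
    for drug, cancers in training_labels.items():
        if drug == query:
            continue  # a drug never contributes to its own prediction
        score = confidence(query, drug)
        for j in cancers:
            totals[j] += score
    # Assumed tie-break: equal scores are ordered by cancer index.
    ranked = sorted(totals, key=lambda j: (-totals[j], j))
    return [j for j in ranked if totals[j] > 0]


# Toy data (hypothetical drugs and scores).
toy_labels = {"d1": {3}, "d2": {1, 3}, "d3": {5}}
toy_scores = {("q", "d1"): 0.9, ("q", "d2"): 0.4, ("q", "d3"): 0.2}


def toy_confidence(a, b):
    return toy_scores.get((a, b), toy_scores.get((b, a), 0.0))


ranking = rank_indications("q", toy_labels, toy_confidence)
```

With these toy values, $C_3$ collects the scores of `d1` and `d2`, $C_1$ only that of `d2`, and $C_5$ only that of `d3`, so `ranking` reproduces the order of the example in Eq. 4.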

The Method Based on Molecular Descriptors 

To compare our method with other methods, the method based
on molecular descriptors was constructed as follows. The structure
optimization of each drug compound was performed using the
AM1 semi-empirical method implemented in AMPAC 8.16 [23].
In total, 454 descriptors, including constitutional, topological, geometrical,
electrostatic, and quantum-chemical descriptors, were calculated
by Codessa 2.7.2 [24]. To encode each drug compound
effectively, the descriptors with missing values were discarded,
resulting in 355 descriptors, i.e. each drug compound d can be
represented by a 355-D (dimension) vector which can be
formulated as follows:



$$D(d) = [s_1, s_2, \ldots, s_{355}]^T \qquad (5)$$

where T is the transpose operator. Accordingly, the relationship between
two drugs $d_1$ and $d_2$ can be calculated by the following formula:

$$S_D(d_1, d_2) = \frac{D(d_1) \cdot D(d_2)}{\|D(d_1)\| \, \|D(d_2)\|} \qquad (6)$$

where $D(d_1) \cdot D(d_2)$ is the dot product of $D(d_1)$ and $D(d_2)$, while
$\|D(d_1)\|$ and $\|D(d_2)\|$ are the moduli of $D(d_1)$ and $D(d_2)$,
respectively.

Similar to the method based on chemical-chemical interactions,
the score that a query drug $d_q$ can treat cancer $C_j$ can be
calculated by the following formula:

$$\Pi_D(d_q \Rightarrow C_j) = \sum_{i=1}^{n} f_{j,i} \cdot S_D(d_i, d_q), \quad j = 1,2,3,4,5,6,7,8 \qquad (7)$$

The rest of the procedure is the same as that of the method based on
chemical-chemical interactions, which also provides a series of
candidate cancers that $d_q$ can treat, ranging from the most likely
one to the least likely one.
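Eqs. 6 and 7 can be sketched in a few lines. The sketch assumes descriptor vectors are plain Python lists of equal length; the function names and the two-dimensional toy vectors are illustrative only (the study uses 355 descriptors), and returning zero for an all-zero vector is an assumption added to avoid division by zero.

```python
import math


def cosine_similarity(u, v):
    """S_D of Eq. 6: dot product divided by the product of the moduli."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for an all-zero descriptor vector
    return dot / (norm_u * norm_v)


def descriptor_scores(query_vec, training, n_cancers=8):
    """Scores of Eq. 7 for every cancer.

    training: list of (descriptor_vector, set_of_cancer_indices) pairs.
    Returns a list whose (j-1)-th entry is the score for cancer C_j.
    """
    totals = [0.0] * n_cancers
    for vec, cancers in training:
        similarity = cosine_similarity(query_vec, vec)
        for j in cancers:
            totals[j - 1] += similarity
    return totals


# Toy two-descriptor example (hypothetical values).
toy_training = [([1.0, 0.0], {1}), ([0.0, 1.0], {2})]
toy_totals = descriptor_scores([1.0, 0.0], toy_training, n_cancers=2)
```

Unlike the confidence scores, which are zero for most pairs, the cosine similarity is generally non-zero for every training drug, so every cancer with at least one training drug usually receives a non-zero score.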

Validation and Evaluation 

The Jackknife test is one of the most popular methods for evaluating
the performance of classifiers. During the test, each sample is
singled out in turn and predicted by the classifier trained on
the remaining samples in the dataset. The test procedure is open and
involves no random partitioning, thereby avoiding arbitrariness [7]. Therefore, the outcome obtained
by the Jackknife test is always unique for a given dataset. In view of
this, many investigators have adopted it to evaluate the accuracies
of their classifiers in recent years [25,26,27,28,29].

As described in Section "Prediction method", the methods in
this study can provide a series of candidate cancers for a given
query drug. The j-th order prediction accuracy is computed by the
following formula [7,8]:

$$ACC_j = \frac{\theta_j}{N} \qquad (8)$$

where N is the total number of drugs in the dataset and $\theta_j$ is the
number of drugs whose j-th order predictions are true
cancers that they can treat. It is obvious that $ACC_j$ measures the
quality of the j-th order prediction. If the true cancers that a query
drug can treat are ranked at low order numbers, the predicted result is deemed
optimal. Thus, high $ACC_j$ with low order number
j and low $ACC_j$ with high order number j indicate a good
performance of the classifier. $ACC_1$ is the most important
indicator of the performance of the classifier.
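The Jackknife procedure and the j-th order accuracy of Eq. 8 can be sketched together. The ranking function used here is a deliberately simple, hypothetical stand-in (it ranks cancers by how many of the remaining drugs treat them), chosen only so that the example is self-contained; any ranking function with the same signature could be plugged in.

```python
def jackknife_rankings(labels, rank_fn):
    """Leave each drug out in turn and rank its cancers using the rest."""
    rankings = {}
    for drug in labels:
        rest = {d: c for d, c in labels.items() if d != drug}
        rankings[drug] = rank_fn(drug, rest)
    return rankings


def order_accuracy(rankings, labels, j):
    """ACC_j: fraction of drugs whose j-th prediction is a true indication."""
    hits = sum(
        1
        for drug, ranked in rankings.items()
        if len(ranked) >= j and ranked[j - 1] in labels[drug]
    )
    return hits / len(rankings)


def frequency_rank(query, rest):
    """Hypothetical ranker: order cancers by frequency among the other drugs."""
    counts = {}
    for cancers in rest.values():
        for c in cancers:
            counts[c] = counts.get(c, 0) + 1
    return sorted(counts, key=lambda c: (-counts[c], c))


# Toy dataset: three drugs, two cancers (made up for illustration).
toy_labels = {"a": {1}, "b": {1, 2}, "c": {2}}
toy_rankings = jackknife_rankings(toy_labels, frequency_rank)
acc_1 = order_accuracy(toy_rankings, toy_labels, 1)
acc_2 = order_accuracy(toy_rankings, toy_labels, 2)
```

Because every drug is held out exactly once and no random split is involved, repeating the run on the same data gives the same accuracies, which is the uniqueness property noted above.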

To evaluate the methods more thoroughly, we calculated the
prediction accuracy on cancer $C_j$ for the i-th order prediction as
follows:

$$ACC_i^j = \frac{\theta_i^j}{N^j} \qquad (9)$$

where $N^j$ is the number of drugs that can treat cancer $C_j$ in the
dataset and $\theta_i^j$ is the number of drugs whose i-th order
prediction is correctly predicted to be cancer $C_j$.

In addition, another measure was used, which was
adopted in some previous studies [6,7,8] and can be calculated
as follows:

$$Q_m = \frac{\sum_{i=1}^{N} \Theta_{i,m}}{\sum_{i=1}^{N} n_i} \qquad (10)$$

where m is the number of top-ranked predictions that are taken into
consideration, $\Theta_{i,m}$ is the number of correct predictions for the
i-th drug compound among its first m predictions, and $n_i$ is the number
of cancers that the i-th drug compound can treat. It is easy to
deduce that $Q_m$ is the proportion of all true cancers that the
samples in the dataset can treat which are covered by the first m predictions
of each sample. It can be seen from Figure 1 that different
drug compounds may have different numbers of cancers they can
treat. In view of this, the parameter m in Eq. 10 usually takes the
value of the smallest integer that is no less than the average number of
cancers that drug compounds in the dataset can treat. This average can be
computed by

$$M = \frac{\sum_{i=1}^{N} n_i}{N} \qquad (11)$$

Generally speaking, a higher $Q_m$ suggests better performance of the
method.
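Eqs. 10 and 11 can be sketched as follows, with m taken as the ceiling of the average M. The function names and the toy rankings are hypothetical.

```python
import math


def average_label_count(labels):
    """M of Eq. 11: average number of cancers a drug can treat."""
    return sum(len(cancers) for cancers in labels.values()) / len(labels)


def coverage(rankings, labels, m):
    """Q_m of Eq. 10: share of true cancers found in the first m predictions."""
    covered = sum(len(set(rankings[d][:m]) & labels[d]) for d in labels)
    total = sum(len(labels[d]) for d in labels)
    return covered / total


# Toy data (hypothetical): three drugs, two cancers.
toy_labels = {"a": {1}, "b": {1, 2}, "c": {2}}
toy_rankings = {"a": [2, 1], "b": [1, 2], "c": [1, 2]}

M = average_label_count(toy_labels)   # 4 true cancers over 3 drugs
m = math.ceil(M)                      # smallest integer no less than M
q_m = coverage(toy_rankings, toy_labels, m)
```

Applied to the totals in Table 1, `math.ceil(77 / 59)`, `math.ceil(34 / 9)` and `math.ceil(58 / 44)` give 2, 4 and 2, which are the numbers of predictions considered for $S_{tr}$, $S_{te}$ and $S_{ite}$ in the Results.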

Results and Discussion 

As described in Section "Materials", the benchmark dataset S
was divided into a training dataset $S_{tr}$ and a validation test dataset
$S_{te}$, which contained 59 and 9 drugs, respectively. In addition, an
independent test dataset $S_{ite}$ containing 44 drugs was constructed
to test the generalization of the method. The prediction method
introduced in Section "The method based on chemical-chemical
interactions" was used to make predictions. The detailed
results are given as follows.

Performance of the Method Based on Chemical-chemical 
Interactions on the Training Dataset 

For the 59 drugs in the training dataset $S_{tr}$, the predictor was
evaluated by the Jackknife test. Listed in column 2 of
Table 2 are the 8 prediction accuracies calculated by Eq. 8, from
which we can see that the 1st order prediction accuracy was
55.93%, while the 2nd order prediction accuracy was 22.73%. It is
also observed from column 2 of Table 2 that the prediction
accuracies generally followed a descending trend as the order
number increased, indicating that the proposed method
arranged the candidate cancers in the training dataset quite well.
In detail, for each order prediction, we calculated the accuracies
for each kind of cancer according to Eq. 9, which are listed in
rows 2-9 of Table 3. It can be seen that most of the 0.00%
accuracies occurred when the prediction order was high, indicating
that each kind of cancer was better predicted at lower
order numbers. The average number of cancers
that drugs in $S_{tr}$ can treat was 1.31 (77/59), calculated by Eq.
11. This means that the average success rate would be only 16.38% if
one made predictions by random guessing, i.e. randomly assigning a
cancer indication to each sample, which is much lower than the 1st
order prediction accuracy obtained by our method. Because the
average number of cancers a drug can treat is 1.31, the first 2
order predictions of each sample in $S_{tr}$ were taken to calculate the
proportion of true cancers that samples in $S_{tr}$ can treat covered by
these predictions according to Eq. 10, obtaining a ratio of
61.04%.

Performance of the Method Based on Chemical-chemical 
Interactions on the Validation Test Dataset 

For the 9 drugs in the validation test dataset $S_{te}$, their
candidate cancers were predicted by the method described in
Section "The method based on chemical-chemical interactions"
based on the information of the drugs in $S_{tr}$. The 8 prediction
accuracies calculated by Eq. 8 are listed in column 3 of Table 2.
It can be seen that the 1st order prediction accuracy was 55.56%,
while the 2nd order one was 66.67%. It is also observed from
Table 2 that the prediction accuracies on this dataset were
generally higher than those on the training dataset, due to the fact
that drugs in $S_{te}$ can treat two or more kinds of cancers,
while most drugs in $S_{tr}$ can only treat one kind of cancer.
Similarly, we calculated the accuracies for each kind of cancer for
the 1st, 2nd, ..., 8th order predictions by Eq. 9; they are listed in rows 10-17 of
Table 3. The average number of cancers that drugs in
$S_{te}$ can treat was 3.78 (34/9), indicating that if one made
predictions by random guessing, the average success rate would be
47.22%, which is lower than the 1st and 2nd order
accuracies listed in column 3 of Table 2. This suggests that the
performance of the method on the validation test dataset is fairly
good. Since the average number of cancers that drugs in $S_{te}$ can
treat was 3.78, the first 4 order predictions of each sample in $S_{te}$
were considered. According to Eq. 10, 61.76% of true cancers
were correctly predicted by the first 4 order predictions.

Table 2. Prediction accuracies of the method based on
chemical-chemical interactions on $S_{tr}$, $S_{te}$ and $S_{ite}$.

| Prediction order | $S_{tr}$ | $S_{te}$ | $S_{ite}$ |
|---|---|---|---|
| 1 | 55.93% | 55.56% | 59.09% |
| 2 | 22.73% | 66.67% | 29.55% |
| 3 | 20.34% | 44.44% | 6.82% |
| 4 | 8.47% | 66.67% | 11.36% |
| 5 | 5.08% | 22.22% | 6.82% |
| 6 | 10.17% | 55.56% | 2.27% |
| 7 | 6.78% | 55.56% | 13.64% |
| 8 | 0.00% | 11.11% | 2.27% |

doi:10.1371/journal.pone.0087791.t002

Performance of the Method Based on Chemical-chemical 
Interactions on the Independent Test Dataset 

The candidate cancers of the 44 drugs in the independent test
dataset $S_{ite}$ were also predicted by our predictor based on the drug
information in $S_{tr}$. The 8 prediction accuracies were obtained and
listed in column 4 of Table 2, from which we can see that the 1st
order prediction accuracy was 59.09%, while the 2nd order
prediction accuracy was 29.55%. To better evaluate the method,
the prediction accuracies on each kind of cancer for the 8 order
predictions were calculated by Eq. 9 and listed in rows 18-25 of
Table 3. The average number of cancers that drugs in $S_{ite}$ can
treat was 1.32 (58/44), suggesting that if one made predictions by
random guessing, the average success rate would be 16.5%, which is
much lower than the 1st order prediction accuracy obtained by our
method. Because the average number of drug indications was
1.32, the first 2 order predictions of each sample in $S_{ite}$ were
considered. According to Eq. 10, 67.24% of true cancers were
correctly predicted by the first 2 order predictions.
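The random-guess success rates quoted for the three datasets follow from dividing the average number of cancers a drug can treat by the 8 candidate cancers: a drug with k true cancers is hit by a single uniform guess with probability k/8, and averaging over drugs gives M/8. The sketch below recomputes them from the totals in Table 1; the function and variable names are hypothetical.

```python
N_CANCERS = 8

# (total number of drug-cancer pairs, number of drugs), taken from Table 1
DATASETS = {
    "S_tr": (77, 59),
    "S_te": (34, 9),
    "S_ite": (58, 44),
}


def random_guess_rate(n_pairs, n_drugs, n_cancers=N_CANCERS):
    """Expected hit rate when one cancer is assigned uniformly at random."""
    average_indications = n_pairs / n_drugs
    return average_indications / n_cancers


rates = {name: random_guess_rate(*counts) for name, counts in DATASETS.items()}
```

The recomputed rates agree with the reported 16.38%, 47.22% and 16.5% to within about 0.1 percentage point; the small differences appear to depend on whether the average is rounded to two decimals before dividing by 8.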

Comparison with other Methods 

To demonstrate the effectiveness of our method for the prediction of
cancer drug indications, some other methods were built for
comparison.

The method based on molecular descriptors described in
Section "The method based on molecular descriptors" was
applied to $S_{tr}$ with its performance evaluated by the Jackknife
test. The 8 prediction accuracies calculated by Eq. 8 are listed in
column 2 of Table 4, from which we can see that the 1st order
prediction accuracy was 41.38%. It is much lower than the 1st
order prediction accuracy of 55.93% obtained by the method
based on chemical-chemical interactions. Also, for drugs in $S_{te}$ and
$S_{ite}$, their cancer indications were predicted by molecular
descriptors based on $S_{tr}$. The prediction accuracies are listed in columns
3 and 4 of Table 4. In detail, the 1st order prediction accuracies on
$S_{te}$ and $S_{ite}$ were 55.56% and 44.19%, respectively. Compared
with the prediction accuracies of 55.56% on $S_{te}$ and 59.09% on
$S_{ite}$ using chemical interactions, the two methods performed at the same level
on $S_{te}$, while chemical interactions performed much better than molecular
descriptors on $S_{ite}$. In addition, we considered the first 2, 4
and 2 order predictions on $S_{tr}$, $S_{te}$ and $S_{ite}$, respectively, according to the
average number of cancers that drugs in these datasets can treat.
The proportions of true cancers that samples in $S_{tr}$, $S_{te}$ and $S_{ite}$ can
treat covered by these predictions were 51.39%, 58.82% and
49.12%, respectively, which were all lower than the corresponding
proportions of 61.04%, 61.76% and 67.24%
obtained by the method based on chemical-chemical interactions.
Therefore, the method based on chemical interactions was
superior to the method based on molecular descriptors.
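Table 3. Prediction accuracies on 8 kinds of cancers for each order prediction obtained by our predictor.

| Dataset | Prediction order | $C_1$ | $C_2$ | $C_3$ | $C_4$ | $C_5$ | $C_6$ | $C_7$ | $C_8$ |
|---|---|---|---|---|---|---|---|---|---|
| $S_{tr}$ | 1 | 0.00% | 25.00% | 95.83% | 46.15% | 0.00% | 0.00% | 40.00% | 0.00% |
| $S_{tr}$ | 2 | 37.50% | 25.00% | 4.17% | 53.85% | 0.00% | 11.11% | 0.00% | 0.00% |
| $S_{tr}$ | 3 | 62.50% | 37.50% | 0.00% | 0.00% | 0.00% | 22.22% | 20.00% | 16.67% |
| $S_{tr}$ | 4 | 0.00% | 12.50% | 0.00% | 0.00% | 0.00% | 22.22% | 0.00% | 33.33% |
| $S_{tr}$ | 5 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 50.00% |
| $S_{tr}$ | 6 | 0.00% | 0.00% | 0.00% | 0.00% | 50.00% | 44.44% | 0.00% | 0.00% |
| $S_{tr}$ | 7 | 0.00% | 0.00% | 0.00% | 0.00% | 50.00% | 0.00% | 40.00% | 0.00% |
| $S_{tr}$ | 8 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| $S_{te}$ | 1 | 0.00% | 0.00% | 83.33% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| $S_{te}$ | 2 | 0.00% | 0.00% | 16.67% | 83.33% | 0.00% | 0.00% | 0.00% | 0.00% |
| $S_{te}$ | 3 | 100.00% | 0.00% | 0.00% | 16.67% | 0.00% | 0.00% | 50.00% | 33.33% |
| $S_{te}$ | 4 | 0.00% | 80.00% | 0.00% | 0.00% | 33.33% | 0.00% | 0.00% | 0.00% |
| $S_{te}$ | 5 | 0.00% | 0.00% | 0.00% | 0.00% | 16.67% | 0.00% | 0.00% | 33.33% |
| $S_{te}$ | 6 | 0.00% | 0.00% | 0.00% | 0.00% | 16.67% | 80.00% | 0.00% | 0.00% |
| $S_{te}$ | 7 | 0.00% | 20.00% | 0.00% | 0.00% | 33.33% | 20.00% | 0.00% | 33.33% |
| $S_{te}$ | 8 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 50.00% | 0.00% |
| $S_{ite}$ | 1 | 0.00% | 50.00% | 66.67% | 36.36% | 0.00% | 33.33% | 100.00% | 14.29% |
| $S_{ite}$ | 2 | 100.00% | 16.67% | 14.29% | 36.36% | 0.00% | 11.11% | 0.00% | 42.86% |
| $S_{ite}$ | 3 | 0.00% | 16.67% | 0.00% | 18.18% | 0.00% | 0.00% | 0.00% | 0.00% |
| $S_{ite}$ | 4 | 0.00% | 0.00% | 9.52% | 9.09% | 0.00% | 22.22% | 0.00% | 0.00% |
| $S_{ite}$ | 5 | 0.00% | 16.67% | 4.76% | 0.00% | 0.00% | 0.00% | 0.00% | 14.29% |
| $S_{ite}$ | 6 | 0.00% | 0.00% | 0.00% | 0.00% | 50.00% | 0.00% | 0.00% | 0.00% |
| $S_{ite}$ | 7 | 0.00% | 0.00% | 4.76% | 0.00% | 50.00% | 33.33% | 0.00% | 14.29% |
| $S_{ite}$ | 8 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 14.29% |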
As described in the above three sections, the performance of
our method was much better than that of random guessing,
which randomly assigned a cancer indication to a query drug.
Here, another random guess method was applied to evaluate our
method from a different aspect. For any query drug $d_q$, we
randomly selected a drug compound in the training set, say d, and
assigned the true cancers that d can treat to $d_q$, i.e. the predicted
cancers of $d_q$ were the same as the true cancers that d can treat. Since
there is no order information in the predicted candidate cancers
for each sample, the measures provided in Section "Validation
and evaluation" cannot evaluate the performance of this method.
Thus, Recall and Precision [30,31] were employed to evaluate its
performance, which can be computed by

$$\text{Precision} = \frac{\sum_{i=1}^{N} TP_i}{\sum_{i=1}^{N} P_i}, \qquad \text{Recall} = \frac{\sum_{i=1}^{N} TP_i}{\sum_{i=1}^{N} R_i} \qquad (12)$$

where $TP_i$ is the number of correctly predicted cancers for the i-th
drug compound, $R_i$ represents the number of cancers which the
i-th drug compound can treat, $P_i$ represents the number of
predicted cancers for the i-th drug compound, and N is the total
number of tested samples.

Table 4. Prediction accuracies of the method based on
molecular descriptors on $S_{tr}$, $S_{te}$ and $S_{ite}$.

| Prediction order | $S_{tr}$ | $S_{te}$ | $S_{ite}$ |
|---|---|---|---|
| 1 | 41.38% | 55.56% | 44.19% |
| 2 | 22.41% | 77.78% | 20.93% |
| 3 | 18.97% | 55.56% | 18.60% |
| 4 | 6.90% | 33.33% | 13.95% |
| 5 | 8.62% | 33.33% | 11.63% |
| 6 | 6.90% | 33.33% | 9.30% |
| 7 | 5.17% | 55.56% | 11.63% |
| 8 | 13.79% | 33.33% | 2.33% |

doi:10.1371/journal.pone.0087791.t004

The random guess method described in the above paragraph
was applied to $S_{tr}$ with its performance evaluated by the Jackknife
test. The Precision and Recall were 15.29% and 16.88%,
respectively. For the results predicted on $S_{tr}$ by chemical-chemical
interactions, the 1st order prediction of each sample was picked,
giving a Precision of 55.93% and a Recall of 42.86%, which were
much higher than those of the random guess method.
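Precision and Recall as defined in Eq. 12 pool the counts over all drugs before dividing. A minimal sketch, with hypothetical function names and toy predictions:

```python
def precision_recall(predicted, actual):
    """Pooled Precision and Recall of Eq. 12.

    predicted, actual: dicts mapping each drug to a set of cancer indices.
    """
    true_positives = sum(len(predicted[d] & actual[d]) for d in actual)
    n_predicted = sum(len(predicted[d]) for d in actual)
    n_actual = sum(len(actual[d]) for d in actual)
    return true_positives / n_predicted, true_positives / n_actual


# Toy data (hypothetical): one predicted cancer per drug.
toy_actual = {"a": {1}, "b": {1, 2}, "c": {2}}
toy_predicted = {"a": {2}, "b": {1}, "c": {1}}
precision, recall = precision_recall(toy_predicted, toy_actual)
```

When only the 1st order prediction of each drug is kept, every drug contributes exactly one predicted cancer, so Precision coincides with the 1st order accuracy. The reported 55.93% and 42.86% on $S_{tr}$ are consistent with 33 correct 1st order predictions, divided by 59 predictions and by 77 true drug-cancer pairs, respectively.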
It is easy to see that our method depends heavily on the
confidence scores of chemical-chemical interactions. To test the
importance of these scores, we randomly exchanged the
confidence scores of some interactions. Based on the random
permutations, the data were evaluated by the Jackknife test on the
training dataset $S_{tr}$. The 1st order prediction accuracy was
23.73%, while the prediction accuracies of the 2nd, 3rd, ..., 8th
order predictions were 18.64%, 11.86%, 18.64%, 20.34%,
15.25%, 13.56% and 8.47%, respectively. It is observed that the 1st
order prediction accuracy obtained by random permutation was
much lower than the 55.93% obtained by chemical interactions.
Furthermore, the 8 prediction accuracies did not follow a
descending trend as the order number increased, indicating
that the candidate cancers were not arranged well. This implies
that confidence scores are very important to the predictions.
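One way to carry out such a randomization is sketched below. The text does not specify how many interactions were exchanged or how they were chosen, so shuffling all scores among the existing interacting pairs is an assumption made for this illustration; the function name and toy scores are hypothetical.

```python
import random


def permute_scores(scores, seed=None):
    """Reassign the existing confidence scores to the same pairs at random.

    scores: dict mapping a compound pair to its confidence score.
    The set of interacting pairs and the multiset of scores are unchanged;
    only the assignment of scores to pairs is shuffled.
    """
    rng = random.Random(seed)
    pairs = list(scores)
    values = [scores[p] for p in pairs]
    rng.shuffle(values)
    return dict(zip(pairs, values))


# Toy scores (hypothetical).
toy_scores = {("a", "b"): 900, ("a", "c"): 250, ("b", "d"): 400, ("c", "d"): 150}
shuffled = permute_scores(toy_scores, seed=0)
```

The shuffled dictionary can then be passed to the same Jackknife evaluation as the original scores, so that any drop in accuracy is attributable to the score values rather than to which pairs interact.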

Discussion 

In the training dataset, 26 1st order predictions were 'wrong',
that is, the predicted cancer indications of these drugs were not
recorded in KEGG. These 26 drugs and their 1st order predictions
are available in Table S3. However, some references reported
that 23 of these 26 drugs were actually effective against their 'wrong'
indications, and the same was true of 3 of the 4 drugs in the
validation test dataset (see Table S3 for these 4 drugs and
their 1st order predictions) and 13 of the 18 drugs in the
independent test dataset (see Table S3 for these 18 drugs
and their 1st order predictions). Thus, we hope that our prediction
model can provide some information for drug repositioning. In the
following paragraphs, we cite some references to support our
predicted results.

Twenty-three Wrong Predicted Pairs of Drug and 
Indication in the Training Dataset 

Cisplatin-Cancers of haematopoietic and lymphoid
tissues. Cisplatin (KEGG ID: D00275), the "penicillin of cancer
drugs", is widely prescribed for many cancer treatments, such as
testicular, ovarian, bladder, lung, and stomach cancers, and lymphoma
[32,33,34]. Prasad et al. investigated the effect of cisplatin on
Dalton's lymphoma, and concluded that cisplatin can induce
complete regression of ascites Dalton's lymphoma in mice [35].

Ifosfamide-Cancers of haematopoietic and lymphoid 
tissues. Ifosfamide (D00343) can be used to treat germ cell 
testicular cancer, cervical cancer, small cell lung cancer, and
non-Hodgkin's lymphoma, among others [36]. Extranodal natural killer/T-cell
lymphoma, nasal type (ENKL) is an Epstein-Barr virus-associated
lymphoid malignancy, and patients with stage IV,
relapsed or refractory ENKL have dismal prognoses. Yamaguchi
et al. explored a new regimen, SMILE, comprising the steroid
dexamethasone, methotrexate, ifosfamide, L-asparaginase, and
etoposide, and concluded that SMILE was effective for this
disease [37,38].

Lomustine-Cancers of haematopoietic and lymphoid 
tissues. Lomustine (D00363) is a component of combination
chemotherapy for treating primary and metastatic brain
tumors, and is also used as a secondary therapy for refractory or
relapsed Hodgkin's disease [39]. Moreover, previous studies
reported that lomustine can be considered for the treatment of
lymphoma in dogs [40,41,42,43], although it induced
common but not life-threatening toxicity [44].

Mitotane-Cancers of the urinary system and male genital 
organs. Mitotane (D00420) is the first-line drug for metastatic
adrenocortical carcinoma [45,46,47], and is also used as
adjuvant therapy after removal of the primary tumor [48].
However, mitotane treatment can induce side effects, such
as adrenal insufficiency and male hypogonadism [49].

Procarbazine-Cancers of haematopoietic and lymphoid 
tissues. Procarbazine (D00478) is used to treat human leukemias
[50]. MOPP (mechlorethamine, Oncovin, procarbazine, and
prednisone) is the first combination chemotherapy regimen for
treating Hodgkin lymphoma (HL) [51]. Moreover, the BACOPP regimen
(bleomycin, adriamycin, cyclophosphamide, vincristine, procarbazine,
and prednisone) improved both tolerability and efficacy in
older HL patients, although it induced a high rate of toxic deaths [52].

Temozolomide-Cancers of haematopoietic and lymphoid 
tissues. Temozolomide (D06067) is an oral alkylating agent 
used for the treatment of anaplastic astrocytoma and glioblastoma 
multiforme [53]. Reni et al. reported that temozolomide was 
effective for immunocompetent patients with recurrent primary 
brain lymphoma, and its toxicity was negligible [54].

Thiotepa-Cancers of haematopoietic and lymphoid 
tissues. Thiotepa (D00583) is an alkylating agent used to treat
breast, ovarian, and bladder cancer [55]. A regimen of reduced-intensity
conditioning with thiotepa, fludarabine, and melphalan
produced remissions and a limited transplant mortality rate in
most multiple myeloma patients [56]. Moreover, Kolb et al.
conducted a phase II nonrandomized single-arm trial of the TVTG
regimen (topotecan, vinorelbine, thiotepa, dexamethasone, and
gemcitabine) for relapsed or refractory leukemia, and reported
a 47% response rate with acceptable toxicities [57].

Floxuridine-Cancers of the digestive system. Floxuridine 
(D04197) is used to treat hepatic metastases of gastrointestinal
adenocarcinomas, and is also used for palliation of cancers in the
liver and gastrointestinal tract [58]. Moreover, hepatic arterial
infusion (HAI) can significantly enhance the antitumor activity of
floxuridine against colorectal liver metastases, as compared with
systemic infusion [59].

Carboplatin-Cancers of haematopoietic and lymphoid 
tissues. Carboplatin (D01363) is an approved drug with fewer side effects
than its parent compound cisplatin in clinical
treatment, and is mainly used to treat ovarian, lung, and head cancers,
among others [34]. In a phase II trial, Gopal et al. reported that
GCD (gemcitabine, carboplatin, dexamethasone, and rituximab)
was a safe and effective outpatient salvage regimen for relapsed
lymphoma [60]. Moskowitz et al. also reported that the ICE
regimen (ifosfamide, carboplatin, and etoposide) was effective for
patients with non-Hodgkin's lymphoma [61].

Epirubicin-Cancers of haematopoietic and lymphoid 
tissues. Epirubicin (D02214) is a component of adjuvant 
therapy in patients after resection of the primary breast cancer 
[62]. When used to treat chronic lymphocytic leukaemia, the 
combination of fludarabine and epirubicin achieved a higher 
response rate and a more rapid response, as compared with 
fludarabine alone [63]. 

Gemcitabine-Cancers of haematopoietic and lymphoid
tissues. Gemcitabine (D01155) is a nucleoside analog that can
treat breast, non-small cell lung, and pancreatic cancer [64]. 
Moreover, a regimen including gemcitabine, carboplatin, dexa- 
methasone, and rituximab was reported to be effective for relapsed 
lymphoma [60]. 

Vinorelbine-Cancers of the breast and female genital 
organs. Vinorelbine (D01935) is used to treat non-small cell
lung cancer [65]. Aapro et al. explored the effects of vinorelbine on
metastatic breast cancer (MBC), and concluded that oral
vinorelbine was highly effective and well tolerated in patients
with MBC, whether as a single agent or in combination with other
agents [66]. Moreover, vinorelbine was also considered a
promising alternative for older patients with advanced breast
cancers because of its clinical activity and few side effects [67].

Irinotecan-Cancers of the breast and female genital 
organs. Irinotecan (D01061) is used to treat metastatic colorectal
cancer and extensive small cell lung cancer [68]. Previous
studies reported that irinotecan was effective for refractory
metastatic breast cancer after anthracycline or taxane treatment
[69,70]. Moreover, the combination of irinotecan and docetaxel
also achieved a high response rate in pre-treated advanced breast
cancer patients [71].

Capecitabine-Cancers of the breast and female genital 
organs. Capecitabine (D01223) is an oral agent used for the
treatment of metastatic breast cancers, and its toxicities are generally
manageable [72,73,74].

Gefitinib-Cancers of the breast and female genital 
organs. Gefitinib (D01977) is used for the continued treatment
of patients with locally advanced or metastatic non-small cell lung
cancer after failure of either platinum-based or docetaxel
chemotherapies [75]. Moreover, gefitinib is the first selective
inhibitor of the epidermal growth factor receptor (EGFR) tyrosine
kinase, which controls cell proliferation by activating the Ras
signal transduction cascade [75]. Thus, gefitinib may be a
promising agent for the treatment of metaplastic breast
carcinoma, which frequently expresses EGFR [76].

Sorafenib-Cancers of the lung and pleura. Sorafenib 
(D06272) is a multi-kinase inhibitor that targets the Raf/MEK/ERK
pathway, and is approved for the treatment of advanced renal cell
carcinoma and advanced hepatocellular carcinoma [77]. Blumenschein
et al. reported that continuous treatment with sorafenib
400 mg twice daily helped stabilize disease in patients with
advanced non-small-cell lung cancer, which is associated with
Raf/MEK/ERK signaling [78].

Paclitaxel-Cancers of the lung and pleura. Paclitaxel 
(D05333) is used for the treatment of Kaposi's sarcoma, lung 
cancer, ovarian cancer, and breast cancer [79]. Hensing et al. 
explored the effects of carboplatin and paclitaxel (C/P) on elderly
patients with advanced non-small-cell lung cancer, as compared
with younger patients. The study indicated that survival rates
and quality of life did not differ between the elderly and younger groups, so
C/P should be a reasonable regimen for elderly patients with this
kind of cancer [80].

Dacarbazine-Cancers of the breast and female genital 
organs. Dacarbazine (D00288) is used to treat metastatic 
malignant melanoma and Hodgkin's disease [81]. Moreover, the 
regimen including cisplatin, adriamycin, and dacarbazine was 
reported to be effective for patients with metastatic uterine and 
ovarian mixed mesodermal sarcomas [82]. 

Sunitinib-Cancers of the breast and female genital 
organs. Sunitinib (D06402) is an approved drug for the 
treatment of renal cell carcinoma and imatinib-resistant gastrointestinal
stromal tumor [83]. Moreover, a previous study reported
that single-agent sunitinib achieved an objective response rate of 11%
in MBC [84], and the combination of sunitinib and paclitaxel was
also well tolerated in patients with locally advanced breast cancer or MBC [85].

Flutamide-Cancers of the breast and female genital 
organs. Flutamide (D00586) is an antiandrogen for the management
of prostate carcinoma [86]. Dimonaco et al. reported that
flutamide had an inhibitory effect on the growth of rat breast 
cancer [87]. 

Leucovorin-Cancers of the breast and female genital 
organs. Leucovorin (D01211) is used to treat osteosarcoma
after high-dose methotrexate therapy [88]. Moreover, a phase II
study showed that the regimen of weekly mitoxantrone,
5-fluorouracil, and leucovorin (MFL) was well tolerated and
moderately effective in treating MBC [89]. A phase 3 trial of
eniluracil+5-fluorouracil+leucovorin in MBC is also ongoing [90].

Goserelin-Cancers of the breast and female genital 
organs. Goserelin (D00573) is a luteinizing hormone blocker, 
and reduces the oestrogen level. Thus, goserelin can improve the 
long-term survival of premenopausal women with early breast 
cancer [91]. 

Fluorouracil-Cancers of haematopoietic and lymphoid 
tissues. Fluorouracil (5-FU, D00584) is used to treat multiple 
actinic and solar keratoses [92]. Takeno et al. reported that a patient
with advanced esophageal cancer accompanied by multiple lymph
node metastases was successfully treated with the combination of
docetaxel, cisplatin, and fluorouracil [93].

Three Wrongly Predicted Pairs of Drug and Indication in
the Validation Test Dataset

Dactinomycin-Cancers of haematopoietic and lymphoid 
tissues. Dactinomycin (D00214) is an antineoplastic agent, 
which can treat Wilms' tumor and rhabdomyosarcoma [94]. 
However, it is reasonable to consider this compound for the
treatment of cancers of lymphoid tissues because it induced
tumor regression in childhood lymphoma [95].

Mitomycin-Cancers of haematopoietic and lymphoid 
tissues. Mitomycin (D00208) is a chemotherapy drug for
treating cancers of the lip, oral cavity, and digestive organs, among others [96].
Mitomycin was used to treat a case of localized conjunctival
mucosa-associated lymphoid tissue lymphoma, with minimal and
controllable local side effects [97]. Moreover, mitomycin was about 5
times more potent than porfiromycin (methyl mitomycin) in
inhibiting tumor growth in lymphoma L1210 [98], but M-83
(7-N-(p-hydroxyphenyl)mitomycin) showed significantly higher
therapeutic activity than mitomycin in lymphoma EL4 [99].

Etoposide-Cancers of the breast and female genital 
organs. Etoposide (D04107) is used to treat refractory testicular 
tumors, small cell lung cancer, lymphoma, non-lymphocytic 
leukemia, glioblastoma multiforme, and so on [100]. Poplin et al. 
reported that oral etoposide had modest activity in chemonaive
patients with metastatic endometrial cancer, but the minimal
toxicity of this drug made it suitable for combination
chemotherapy [101]. Moreover, etoposide was reported to be
one of the most effective agents for trophoblastic disease [102], 
and the combination of etoposide, ifosfamide/mesna, and cisplatin 
(VIP) appeared to be active in advanced cervical cancer [103]. 

Thirteen Wrongly Predicted Pairs of Drug and Indication in
the Independent Test Dataset

Diethylstilbestrol-Cancers of the breast and female 
genital organs. Diethylstilbestrol (DrugBank ID: DB00255) is 
used for the treatment of prostate cancer [104]. Moreover, 
Peethambaram et al. reported that diethylstilbestrol was more 
effective than tamoxifen in postmenopausal women with MBC, 
but this treatment was usually associated with toxicity such as 
nausea, edema, vaginal bleeding, and cardiac problems [105]. 

Bleomycin-Cancers of the nervous system. Bleomycin 
(DB00290) is a drug for the palliative treatment of malignant
neoplasms, such as lung cancers and lymphomas [106]. Moreover,
Takeuchi et al. reported that bleomycin was effective in
patients with gliomas, with a response rate of more than 50%
[107]. In addition, electrochemotherapy enhanced bleomycin uptake and
achieved 69% complete elimination of glial cell derived tumor cells
[108].

Bexarotene-Cancers of the lung and pleura. Bexarotene 
(DB00307) is used orally to treat skin manifestations of cutaneous 
T-cell lymphoma in patients after at least one prior systemic 
therapy [109]. Moreover, bexarotene was effective in preventing
the growth and progression of lung tumors in mice [110], and the
combination of bexarotene+paclitaxel or bexarotene+vinorelbine
had significantly greater antitumor effects than either single agent
[111].

Dexrazoxane-Cancers of haematopoietic and lymphoid 
tissues. Dexrazoxane (DB00380) can reduce the incidence and 
severity of cardiomyopathy associated with doxorubicin adminis- 
tration in women with MBC [112]. Moreover, dexrazoxane was 
used as a cardioprotective agent that can attenuate the QT and 
QTc dispersion associated with epirubicin-based chemotherapy in 
patients with aggressive non-Hodgkin lymphoma [113], and 
prevent or reduce cardiac injury associated with doxorubicin 
administration for childhood acute lymphoblastic leukemia 
[114,115]. 

Valrubicin-Cancers of haematopoietic and lymphoid
tissues. Valrubicin (DB00385) is used to treat bladder cancer
[116]. Moreover, valrubicin was reported to inhibit the growth of 
leukemia cells [117,118]. 

Zoledronate-Cancers of the breast and female genital 
organs. Zoledronate (DB00399) is used for the treatment of
patients with multiple myeloma and bone metastases from solid
tumors in combination with standard antitumor therapy [119].
Moreover, Steinman et al. reported that zoledronate increased 
disease-free survival in postmenopausal and in premenopausal, 
hormone-suppressed breast cancer patients, but had no antitumor 
effect for premenopausal patients without ovarian suppression 
[120]. 

Pemetrexed-Cancers of the digestive system. Pemetrexed 
(DB00642) is used as a single agent to treat locally advanced or
metastatic non-small cell lung cancer (NSCLC) after prior chemotherapy, and is also used,
in combination with cisplatin, for the treatment of malignant pleural
mesothelioma in adults [121]. A phase II study reported that
pemetrexed disodium was effective for patients with advanced 
gastric cancer, and the supplementation of folic acid decreased the 
toxicity with no compromise in efficacy [122]. 

Fluoxymesterone-Cancers of haematopoietic and 
lymphoid tissues. Fluoxymesterone (DB01185) is used for
the palliative treatment of androgen-responsive recurrent mammary
cancer in women who are more than one year
but less than five years postmenopausal [123]. Moreover, Bai et al. reported that
fluoxymesterone stimulated the proliferation and differentiation of
normal erythropoietic burst-forming units that are affected by
inhibitory factors produced by leukemic cells [124].

Genistein-Cancers of the lung and pleura. Genistein 
(DB01645) is an experimental agent for the treatment of prostate 
cancer [125]. Moreover, Lian et al. reported that genistein may be 
a promising agent to treat NSCLC because genistein induced 
apoptosis of NSCLC cells by a p53-independent pathway [126]. 

Vorinostat-Cancers of the urinary system and male 
genital organs. Vorinostat (DB02546) is used to treat skin 
manifestations of cutaneous T-cell lymphoma patients with 
progressive, persistent or recurrent disease on or after two systemic
therapies [127]. Pratap et al. reported that vorinostat inhibited 
tumor growth and associated osteolysis in the prostate cancer cells, 
but increased normal bone loss [128]. 

Ixabepilone-Cancers of the digestive system. Ixabepilone 
(DB04845) is investigated for the treatment of breast cancer, head 
and neck cancer, lung cancer, and so on [129]. Moreover, 
ixabepilone was reported to be active against advanced or 
metastatic gastric cancers [130,131]. 

Trabectedin-Cancers of the lung and pleura. Trabectedin 
(DB05109) is used to treat soft tissue sarcoma and ovarian cancer,
and is also being investigated for the treatment of gastric cancer, among others
[132]. Moreover, Massuti et al. reported that trabectedin had
modest activity in NSCLC patients pretreated with platinum 
[133]. 

Cabazitaxel-Cancers of the breast and female genital 
organs. Cabazitaxel (DB06772) is used for the treatment of 
hormone-refractory metastatic prostate cancer patients pretreated 
with docetaxel [134]. Moreover, Villanueva et al. reported that the 
combination of cabazitaxel+capecitabine was active in patients 
with MBC [135]. 

Conclusions 

In this study, an order-prediction model for drugs and their 
indications was built using the chemical-chemical interaction 
information extracted from STITCH. The outstanding performance
of our model implied that the model was feasible for
drug-indication prediction, i.e. interactive chemicals were more
likely to treat the same cancers than non-interactive ones.
Moreover, it was demonstrated that most of the 'wrong'
predictions might actually be right, which may help reposition drugs
to new indications according to the prediction results.